Fix location_table access pattern#27
Conversation
|
Using current OTel eBPF profiler against this PR results in stacktraces that looks like this: From the mix of order of frames, we can tell that this is not a correct representation of a stacktrace. |
|
Hi @florianl, thank you for checking. It does not look good indeed, do you see the mistake in my reasoning though? That's how I interpret the spec, and I think @rockdaboot agrees. There's some discussion about this in open-telemetry/opentelemetry-ebpf-profiler#534, maybe the two things are related. |
|
I suggest to do things in this order:
If done correctly, receiving and displaying the traces should just work. |
|
Closing this PR as the bug originates the source and will be fixed with open-telemetry/opentelemetry-ebpf-profiler#538. |
|
@florianl are you sure about this? I think the proposed diff is still relevant, even after the fix in ebpf-profiler. |
|
Yes, as investigations like #27 (comment) showed, the bug remains in the source rather than the backend. |
|
As far as I understand the code here, there are two bugs. One was in ebpf-profiler and the other one is still in devfiler, which doesn't take the second indirection into account. |
|
@fandreuz I reopened this PR as this is still an issue, devfiler currently assumes that locations are stored in a linear way (non-deduplicated) and breaks if that's not the case which is also a protocol violation. The work I did in this PR to enable |
|
Hi @christos68k, thanks for reopening this. I'm on holiday and won't be able to work on this issue before next week. If it's urgent someone else will have to pick it up, I'll be happy to do it later though. |
No worries, I tested it and it works fine for me but feel free to also take a look when you come back. |
florianl
left a comment
There was a problem hiding this comment.
Along with open-telemetry/opentelemetry-ebpf-profiler#550 I looked more into this change as well. I was wrong and it seems like I got tricked by the Rust build cache.
|
Asking again, can we also have unit tests with this PR, please? |
|
Hi @rockdaboot, happy to add some tests to this PR. However, I don't see any unit test in this project ( |
Tbh, I am not a rust expert, but would assume doing tests like at devfiler/src/storage/tables/stacktraces.rs Line 165 in 0648a45 Just for the background: |
|
To enable test runs after the Nix build (and thus also in CI), you should just need to flip the boolean here: Line 94 in d181b97 If that doesn't work, just lmk and I'd be happy to take a look. |
|
Merging this PR as it affects multiple people. Tests still can be added in a separate PR. Thanks @fandreuz for identifying and fixing the issue 🙏 |

As discussed on Slack, I propose a small fix for the location access pattern. According to the spec, the range designated by a sample should first check the indices in
Profile.location_indices.I'd expect the following:
but devfiler currently does:
I spotted this while testing devfiler on Async-Profiler OTLP profiles, I'm getting index out of bounds error for location indices. This is also related to open-telemetry/opentelemetry-ebpf-profiler#534.